CPSC 545/445 (Autumn 2003) - Class 14: RNA structure

Module 5, Part 1

---

5.1 Introduction

recap (from Module 1): RNAs have many functions in the cell
- intermediate stages of protein synthesis (mRNA, tRNA)
- catalytic role (ribozymes)
- structural/catalytic role (rRNA)

many of these functions crucially depend on the structure

RNA World hypothesis:
  RNA existed before DNA and proteins in a world where organisms had
  RNA genomes that were replicated by RNA catalysts and where catalytic
  RNAs played many of the fundamental roles played now by proteins

levels of RNA structure:
- primary structure = base sequence (A,C,G,U)
- secondary structure = base pairing within a strand (and between strands)
- tertiary structure = 3-dimensional structure of molecule

secondary structure is one of the most important factors determining
3d-structure, and hence function

here: focus on secondary structure

experimentally determining RNA structure (even just secondary structure)
is very time-consuming and labour-intensive
=> large incentive to use computational techniques instead

--

5.2 RNA secondary structure

RNA strands form secondary structure by means of base-pairing
(mainly Watson-Crick pairing AU, GC, but others are possible, e.g., GU).

terminology:
  AU, GC         = Watson-Crick (WC) pairs
  GU             = wobble pair
  WC or wobble   = canonical pairs
  all others     = non-canonical pairs

Example: [slide]

definition: the secondary structure S of an RNA strand s of length n
is defined as the set of pairs (i,j), 1 <= i < j <= n, of paired base
positions in s such that each base in s is paired with at most one
other base, i.e.,
  forall (i,j) \in S and (i',j') \in S: i=i' iff j=j', and i != j'

Example: (structure from slide)
  (1,49), (2,48), (3,47), (4,46), (5,45), (7,18), (8,17), ...
  (55,69), (56,68), ...
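
The definition and terminology above can be checked mechanically. The following is a minimal sketch, not part of the original notes: the names classify_pair and is_secondary_structure are made up for illustration, and pairs are assumed to be given as 1-based tuples (i,j) with i < j.

```python
# Illustrative sketch; function names are hypothetical, not from the notes.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(x, y):
    """Return 'WC', 'wobble', or 'non-canonical' for the bases x and y."""
    if (x, y) in WATSON_CRICK:
        return "WC"
    if (x, y) in WOBBLE:
        return "wobble"
    return "non-canonical"


def is_secondary_structure(pairs, n):
    """Check that each position 1..n takes part in at most one pair."""
    seen = set()
    for i, j in pairs:
        # assumed convention: 1-based positions with i < j
        if not (1 <= i < j <= n):
            return False
        # a position that was already used would be paired twice
        if i in seen or j in seen:
            return False
        seen.update((i, j))
    return True


# the first pairs listed in the example above (n = 69 is an assumption,
# taken from the largest position that appears in the listed pairs)
pairs = [(1, 49), (2, 48), (3, 47), (4, 46), (5, 45), (7, 18), (8, 17)]
print(is_secondary_structure(pairs, 69))             # True
print(is_secondary_structure(pairs + [(5, 9)], 69))  # False: base 5 pairs twice
print(classify_pair("G", "U"))                       # wobble
```

Tracking used positions in a set keeps the check linear in the number of pairs, which is one simple way to enforce the "at most one partner" condition.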

secondary structure elements:
- stems = helices, composed of neighbouring base pairs = stacked pairs
  (stems form regular, A-form double helix)
- loops:
  - bulges
  - internal loops
  - hairpin loops
  - multibranched loops (multi-loops)
- pseudoknots

stems, i.e., stacked base pairs, stabilise the structure by reducing
the free energy of the molecule (low free energy = stable, see also below),
while loops have a destabilising effect (increase free energy).

Example for pseudoknot: (see RNase P slide)

Formally, an RNA structure is pseudoknotted iff there exist
(i,j) \in S, (i',j') \in S: i < i' < j < j'

[...]

compensatory mutations in paired base positions

this can be exploited for predicting RNA secondary structure

basic idea:
- measure covariation between aligned sequence positions to infer
  base pairs in RNA secondary structure

comparative sequence analysis approach:

given: k homologous RNA sequences s_1, ..., s_k

1. compute multiple sequence alignment of s_1, ..., s_k
Repeat:
  2. guess structure S from current alignment of s_1, ..., s_k
  3. realign sequences s_1, ..., s_k based on current structure S
Until no change in structure occurred (during the last iteration)

Note:
- for Step 1 to work reasonably well, s_1, ..., s_k must be sufficiently similar
- for Step 2 to work reasonably well, s_1, ..., s_k must be sufficiently different

How to do Step 2?

quantify covariation (degree of compensatory mutations between two
positions) using the concept of mutual information content:

mutual information between columns i and j in alignment:

  M_ij = \sum_{x,y \in {A,C,G,U}} f_ij(x,y) log_2 (f_ij(x,y) / (f_i(x)*f_j(y)))

where
  f_i(x)    = frequency of base x in column i
  f_j(y)    = frequency of base y in column j
  f_ij(x,y) = joint frequency of x,y in columns i,j
(terms with f_ij(x,y) = 0 contribute 0 to the sum)

Example:

  AGCAAUUGCU
  AUCAAUUGAU
  AACAAUUGUU
   *      *
  (((....)))   <- structure

  f_2(G) = 1/3
  f_9(G) = 0
  f_{2,9}(G,C) = 1/3
  f_{2,9}(C,G) = 0

  M_{2,9} = 1/3 * log_2 ((1/3) / (1/3*1/3)) * 3 = log_2 3 = 1.585
  M_{1,10} = 0 (these positions are perfectly conserved)

Note: M_ij is always >= 0 and <= 2

Paired positions that covary have high mutual information content;
perfectly conserved pairs (such as columns 1 and 10 above) have M_ij = 0.

Exercise: give an example for a set of aligned RNA sequences
for which M_{2,6} = 2

[slide: Figure 10.6]

---

Resources:
  Durbin et al., Ch.10
  Lecture notes from CSE 527, Winter 2000 (Martin Tompa) at U Washington,
    http://www.cs.washington.edu/education/courses/527/00wi/lectures/lect16.pdf
  http://www.bioinfo.rpi.edu/~zukerm/seqanal-old/node1.html#SECTION00010000000000000000

Further Reading:
  http://www.bioinfo.rpi.edu/~zukerm/rna/energy/
  http://www.bioinfo.rpi.edu/~zukerm/rna/energy/node2.html
    -> Turner free energy model

---
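
Supplement: the mutual information example above can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions, not the course's implementation: the function name mutual_information is hypothetical, columns are 1-based as in the notes, and the alignment is assumed to contain no gap characters (any character is simply counted as a symbol).

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information M_ij (in bits) between columns i and j (1-based)."""
    k = len(alignment)
    col_i = [s[i - 1] for s in alignment]
    col_j = [s[j - 1] for s in alignment]
    f_i = Counter(col_i)               # counts of each base in column i
    f_j = Counter(col_j)               # counts of each base in column j
    f_ij = Counter(zip(col_i, col_j))  # counts of each observed base pair
    m = 0.0
    for (x, y), count in f_ij.items():
        # only observed (x, y) combinations are visited, so terms with
        # joint frequency 0 are skipped and contribute nothing
        joint = count / k
        m += joint * log2(joint / ((f_i[x] / k) * (f_j[y] / k)))
    return m


# the three aligned sequences from the example
alignment = ["AGCAAUUGCU", "AUCAAUUGAU", "AACAAUUGUU"]
print(round(mutual_information(alignment, 2, 9), 3))  # 1.585
print(mutual_information(alignment, 1, 10))           # 0.0
```

Iterating over the observed pairs only (rather than all 16 combinations of A,C,G,U) avoids evaluating log_2 of 0 for combinations that never occur.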